Context-Aware Zero-Shot Recognition
We present a novel problem setting in zero-shot learning: zero-shot object
recognition and detection in context. In contrast to traditional zero-shot
learning methods, which simply infer unseen categories by transferring
knowledge from objects belonging to semantically similar seen categories, we
aim to identify novel objects in an image surrounded by known objects using an
inter-object relation prior. Specifically, we leverage the visual context and
the geometric relationships between all pairs of objects in a single image, and
capture information useful for inferring unseen categories. We integrate our
context-aware zero-shot learning framework
into the traditional zero-shot learning techniques seamlessly using a
Conditional Random Field (CRF). The proposed algorithm is evaluated on both
zero-shot region classification and zero-shot detection tasks. The results on
the Visual Genome (VG) dataset show that the additional visual context
significantly boosts our model's performance over traditional methods.
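The combination of per-region zero-shot scores with a pairwise relation prior can be sketched on a toy example. This is an illustrative stand-in, not the authors' CRF: the unary scores, the co-occurrence prior, and brute-force MAP inference below are all assumed for the sake of a small, checkable demo.

```python
import itertools
import numpy as np

# Toy sketch (assumed, not the paper's model): each region gets unary
# zero-shot scores; a pairwise prior rewards class pairs that tend to
# co-occur. MAP inference picks the joint labeling with maximum score.
unary = np.array([[0.1, 0.9],    # region 0: scores for classes A, B
                  [0.6, 0.4]])   # region 1: scores for classes A, B
# Pairwise prior: class pair (A, B) co-occurring gets a bonus.
pairwise = np.array([[0.0, 0.5],
                     [0.5, 0.0]])

def map_labeling(unary, pairwise):
    # Brute-force MAP over all joint labelings (fine for tiny examples;
    # a real CRF would use approximate inference).
    n, c = unary.shape
    best, best_score = None, -np.inf
    for labels in itertools.product(range(c), repeat=n):
        score = sum(unary[i, l] for i, l in enumerate(labels))
        score += sum(pairwise[labels[i], labels[j]]
                     for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best, best_score = labels, score
    return best

print(map_labeling(unary, pairwise))  # -> (1, 0)
```

Note how the pairwise bonus changes the answer: on unaries alone, region 1 would take class A regardless, but the joint labeling (B, A) beats (B, B) because the prior favors the mixed pair.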
Dynamic Proposals for Efficient Object Detection
Object detection is a fundamental computer vision task that localizes and categorizes
objects in a given image. Most state-of-the-art detection methods utilize a
fixed number of proposals as an intermediate representation of object
candidates, which is unable to adapt to different computational constraints
during inference. In this paper, we propose a simple yet effective method which
is adaptive to different computational resources by generating dynamic
proposals for object detection. We first design a module that enables a single
query-based model to run inference with different numbers of proposals.
Further, we extend it to a dynamic model to choose the number of proposals
according to the input image, greatly reducing computational costs. Our method
achieves significant speed-up across a wide range of detection models including
two-stage and query-based models while obtaining similar or even better
accuracy.
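The idea of adapting the proposal budget to the input can be sketched as follows. This is a minimal illustration under assumed names: the complexity score, the budget mapping, and the truncation rule are hypothetical, not the paper's actual module.

```python
import numpy as np

# Illustrative sketch (not the paper's code): map a predicted per-image
# "complexity" score in [0, 1] to a proposal budget, then truncate a
# score-ranked proposal list to that budget.
rng = np.random.default_rng(1)

def choose_num_proposals(complexity, min_p=10, max_p=100):
    # Simple linear budget: easy images keep few proposals, hard ones many.
    return int(round(min_p + complexity * (max_p - min_p)))

def dynamic_proposals(scores, boxes, complexity):
    k = choose_num_proposals(complexity)
    order = np.argsort(-scores)[:k]   # keep the k highest-scoring proposals
    return boxes[order], scores[order]

scores = rng.random(100)              # objectness scores for 100 proposals
boxes = rng.random((100, 4))          # matching boxes (x1, y1, x2, y2)
b, s = dynamic_proposals(scores, boxes, complexity=0.2)
assert b.shape == (28, 4)             # a low-complexity image keeps only 28
```

The compute saving comes from everything downstream of the proposal stage scaling with the budget, so a cheap complexity predictor can cut per-image cost on easy inputs.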
DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
Transformer-based detection and segmentation methods use a list of learned
detection queries to retrieve information from the transformer network and
learn to predict the location and category of one specific object from each
query. We empirically find that random convex combinations of the learned
queries are still good for the corresponding models. We then propose to learn a
convex combination with dynamic coefficients based on the high-level semantics
of the image. The generated dynamic queries, named modulated queries, better
capture the prior of object locations and categories in the different images.
Equipped with our modulated queries, a wide range of DETR-based models achieve
consistent and superior performance across multiple tasks including object
detection, instance segmentation, panoptic segmentation, and video instance
segmentation.
Comment: 12 pages, 4 figures, ICML 202
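The core operation, mixing a bank of learned query lists with image-dependent convex coefficients, can be sketched in a few lines. All names and shapes here are assumptions for illustration; the paper's coefficient network and query parameterization may differ.

```python
import numpy as np

# Hypothetical sketch: a bank of learned query lists is mixed with
# softmax (hence convex) coefficients predicted from a global image
# feature, yielding one "modulated" query list per image.
rng = np.random.default_rng(0)

num_queries, dim, num_basis = 100, 256, 4
# Stand-in for `num_basis` learned query lists, each (num_queries, dim).
query_bank = rng.normal(size=(num_basis, num_queries, dim))

def modulated_queries(image_feature, weight):
    logits = image_feature @ weight              # (num_basis,)
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()                       # softmax -> convex weights
    # Mix the bank: sum_b coeffs[b] * query_bank[b]
    return np.einsum("b,bqd->qd", coeffs, query_bank)

image_feature = rng.normal(size=dim)             # e.g. pooled backbone feature
weight = rng.normal(size=(dim, num_basis))       # coefficient-head weights
q = modulated_queries(image_feature, weight)
assert q.shape == (num_queries, dim)
```

Because the coefficients are convex, the modulated queries stay inside the convex hull of the learned bank, which matches the abstract's observation that random convex combinations of learned queries already work well.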
Relabeling Minimal Training Subset to Flip a Prediction
When facing an unsatisfactory prediction from a machine learning model, it is
crucial to investigate the underlying reasons and explore the potential for
reversing the outcome. We ask: can we result in the flipping of a test
prediction by relabeling the smallest subset of the
training data before the model is trained? We propose an efficient procedure to
identify and relabel such a subset via an extended influence function. We find
that relabeling fewer than 1% of the training points can often flip the model's
prediction. This mechanism can serve multiple purposes: (1) providing an
approach to challenge a model prediction by recovering influential training
subsets; (2) evaluating model robustness via the cardinality of the relabeled
subset; we show that this cardinality is highly related to the noise ratio in
the training set and is correlated with, but complementary to, the predicted
probabilities; (3) revealing training points that lead to group attribution
bias. To the best of our knowledge, we are the first
to investigate identifying and relabeling the minimal training subset required
to flip a given prediction.
Comment: Under review
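The flavor of the procedure can be shown with a greedy stand-in on a tiny logistic-regression problem. To be clear, the paper uses an extended influence function to pick the subset; the nearest-point heuristic, trainer, and data below are all assumptions made so the example stays self-contained.

```python
import numpy as np

def train(X, y, epochs=300, lr=0.5):
    # Tiny full-batch logistic-regression trainer (stand-in model).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(w, x):
    return int(x @ w > 0)

def minimal_relabel_to_flip(X, y, x_test):
    # Greedy heuristic (NOT the paper's influence-function procedure):
    # relabel training points nearest to x_test, retrain after each,
    # and stop as soon as the test prediction flips.
    original = predict(train(X, y), x_test)
    y_new = y.copy()
    order = np.argsort(np.linalg.norm(X - x_test, axis=1))
    flipped = []
    for i in order:
        if y_new[i] != original:
            continue                  # only relabel points that support it
        y_new[i] = 1 - y_new[i]
        flipped.append(int(i))
        if predict(train(X, y_new), x_test) != original:
            return flipped
    return flipped

# 1-D points with a bias feature; labels split around x = 0.
X = np.array([[1., -2.], [1., -1.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1., 1.])
x_test = np.array([1., 0.5])
print(minimal_relabel_to_flip(X, y, x_test))  # -> [2]
```

Here relabeling the single nearest positive point moves the decision boundary past the test point, echoing the abstract's finding that very small subsets often suffice.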